Single event upsets (SEUs) induced by heavy ions were observed in 65 nm SRAMs to quantitatively evaluate
the applicability and effectiveness of a single-bit error correcting code (ECC) utilizing Hamming code. The
results show that the ECC improved the performance dramatically: the SEU cross sections of the SRAMs
with ECC were on the order of 10⁻¹¹ cm²/bit, two orders of magnitude lower than those of the SRAMs
without ECC (on the order of 10⁻⁹ cm²/bit). Failures of the ECC module, appearing as 1-, 2- and 3-bit
errors in a single word (not multiple bit upsets), were also detected. The ECC modules in the SRAMs, which
utilize a (12, 8) Hamming code, fail when 2-bit upsets accumulate in one codeword. Finally, the
probabilities of the failure modes involving 1-, 2- and 3-bit errors were calculated to be 39.39%, 37.88%
and 22.73%, respectively, which agree well with the experimental results.
I. INTRODUCTION


As technology scales down in modern integrated circuits
such as SRAMs, the minimum charge needed to upset a
memory cell decreases, while the influence of charge sharing
on adjacent memory cells increases [1-5]. Therefore, advanced
devices (especially deep-submicrometer ones) are much more
sensitive to the energy deposited in the device by heavy ion
irradiation, which critically restricts their use in space.

Many methods have been proposed to mitigate the single
event upsets (SEUs) occurring in advanced devices. Bit
interleaving is a commonly accepted approach to mitigating
multiple bit upsets (MBUs) in a data word. In this architecture,
the bits of a data word are not physically adjacent but are
interleaved with bits of other data words. In this way, every
MBU of physically adjacent memory cells is transformed into
multiple single bit upsets (SBUs) in different memory words.
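
The interleaving idea can be illustrated with a minimal sketch. The interleaving factor of 4, the single-row layout and the function names below are assumptions made for illustration only; the actual physical layout of the devices under test is not given in the text.

```python
# Hypothetical 4-way bit interleaving along one physical row.
INTERLEAVE = 4  # assumed interleaving factor (not specified in the text)


def physical_to_logical(column):
    """Map a physical column index to (logical word index, bit index)."""
    return column % INTERLEAVE, column // INTERLEAVE


def upsets_per_word(columns):
    """Count how many upset cells fall into each logical word."""
    counts = {}
    for col in columns:
        word, _bit = physical_to_logical(col)
        counts[word] = counts.get(word, 0) + 1
    return counts


# One strike upsetting three physically adjacent cells:
cluster = [17, 18, 19]
print(upsets_per_word(cluster))  # every affected word sees a single upset
```

With this mapping, a cluster no wider than the interleaving factor never places two upsets in the same word, so each word is left with at most one bit for the ECC to correct.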

Error correcting code (ECC) utilizing Hamming code is
commonly found in high-reliability and high-performance
applications. As a relatively simple yet powerful ECC, it
corrects a single bit error anywhere within the codeword.

Therefore, MBUs, which are now a major reliability problem
in commercial and industrial electronics, can be transformed
into multiple SBUs that appear as uncorrelated events to
the ECC algorithm, and can then be corrected [2, 5-7].

In this hardening approach, an ECC module combined with
the bit interleaving architecture can be used in high-reliability
and high-performance applications to resolve SBUs in advanced
process node devices.


* Supported by the National Natural Science Foundation of China (Nos. 
11079045 and 11179003) and the Important Direction Project of the CAS 
Knowledge Innovation Program (No.KJCX2-YW-N27) 

+ Corresponding author, suhong@impcas.ac.cn


To observe and compare the SEUs induced by heavy ions
in SRAMs of different process nodes, and to quantitatively
evaluate the applicability and effectiveness of single-bit ECC
utilizing Hamming code in advanced process SRAMs, we used
a ¹²C ion beam to irradiate four SRAMs from ISSI. Two of
them, manufactured in 130 nm and 150 nm processes, are the
most advanced devices among the manufacturer's SRAMs
without an ECC module, while the other two are 65 nm SRAMs
with an ECC module. The results are presented and analyzed
below.


II. EXPERIMENTAL BACKGROUND 


Four industrial SRAMs, produced in high-performance
CMOS technology, were irradiated at normal incidence in
vacuum by ¹²C beams from the Heavy Ion Research Facility
in Lanzhou (HIRFL). The ¹²C ions had an effective linear
energy transfer (LET) value of 1.8 MeV·cm²/mg. Table 1 lists
the SRAMs under test. The IS2ME is a 2 Mbit SRAM organized
as 131 072 words by 16 bits with ECC, 65 nm process node;
the IS4ME is a 4 Mbit SRAM organized as 262 144 words by
16 bits with ECC, 65 nm process node; the IS2M is a 2 Mbit
SRAM organized as 131 072 words by 16 bits without ECC,
150 nm process node; and the IS4M is a 4 Mbit SRAM
organized as 262 144 words by 16 bits without ECC, 130 nm
process node. The first two SRAMs, with ECC, are the main
objects of observation, and the other two serve as reference
devices. All four SRAMs belong to the IS61WV series made by
ISSI, and the ECC function in these devices is implemented
with Hamming code, a relatively simple yet powerful ECC that
can correct all single bit errors in one codeword.

The SRAMs were tested using a data pattern of all “1”
(blanket pattern) at a supply voltage of 3.3 V, and the operating
frequency was set at 20 MHz throughout. In the static test
mode, the devices were written prior to the beam shot and read
periodically throughout the beam shot (this technique is often
referred to as multiple-read) [1, 8, 9]. The erroneous data found
in the test were stored in another RAM (referred to as the
mirrored RAM relative to the SRAM under test) in the test
system, as reference data for the next read cycle. The test flow
applied (Fig. 1) distinguishes SBU, MBU and single event
latchup (SEL). All upset events were recorded with a timestamp
and bitmap location.

TONG Teng et al.

TABLE 1. The information of SRAMs under test

Device                 Process node (nm)  Capacity (Mbits)  ECC      Abbreviation
61WV12816EDBLL-10TLI   65                 2                 with     IS2ME
61WV25616EDBLL-10TLI   65                 4                 with     IS4ME
61WV12816DBLL-10TLI    150                2                 without  IS2M
61WV25616BLL-10TLI     130                4                 without  IS4M


Fig. 1. Static test flow (flowchart blocks: initialize DUT using logical blanket vector; read DUT vector and count errors; irradiation; finish).


III. RESULT ANALYSIS AND DISCUSSION
A. The high efficiency of ECC module 


SEU cross sections of the four SRAMs are shown in Fig. 2.
The SRAMs without an ECC module are much more sensitive
to the irradiation than the devices with an ECC module: the
SEU cross sections of the SRAMs without ECC are on the
order of 10⁻⁹ cm²/bit, while those of the SRAMs with ECC are
on the order of 10⁻¹¹ cm²/bit. However, the IS2ME and IS4ME
are produced in a more advanced 65 nm process than the IS2M
(150 nm process node) and IS4M (130 nm process node), and
with technology scaling the number of upsets per chip increases
due to higher circuit density and sensitivity.




Therefore, the sharp contrast between the two groups of data
should be attributed to the high efficiency of the ECC module.

Fig. 2. Cross sections of four SRAMs.

Fig. 3. The bits per upset event distribution.


B. The ineffectiveness of ECC module 


Only 1-bit upsets in a data word were detected in the devices
without an ECC module in this experiment. Upset events
involving 1-, 2- and 3-bit errors occurred in the devices with an
ECC module. Fig. 3 shows the measured and theoretical
distributions of bits per upset event (percentages of total events).
In the following discussion, the words “upset” and “error” are
used in distinct senses: an “upset” is the real change that occurs
in a memory cell, and an “error” is wrong data finally read out
from the memory.


1. The fundamental reason 


To discuss the experimental results, we make the following
assumptions:


1. Considering the beam energy of ¹²C and the bit
interleaving architecture, ion beams at normal incidence
do not affect adjacent memory cells simultaneously.
MBUs are therefore not expected to occur in a codeword
at any time in this experiment [2, 5-7].


2. The static mode used in this test means that only one
write operation is performed in a test cycle, and the ECC
module does not correct or re-write the memory itself [1];
it only corrects the “error” bit(s) in the data being read
out. After the data are read out through the ECC module,
the memory remains in the upset state until a new write
command arrives with new data. Therefore, if another bit
is upset in the same word, the ECC module utilizing
Hamming code, which can correct only one bit error,
fails. The failure of the ECC module is thus an
accumulation effect caused by several SBUs in a word at
different times. In addition, the ECC functional block
diagram (Fig. 4, from the datasheet of the devices with
ECC module) shows that the ECC module utilizes the
(12, 8) Hamming code.


Based on the time structure of the cyclotron and the upstream
scanning magnets, the incident ions have a uniform temporal
and spatial distribution in the flux range used; thus each SBU
can be regarded as an independent random event.

For independent random events with a per-bit upset probability
$p$ ($p \ll 1$), the probability that $r$ bit(s) are upset in an
$n$-bit codeword is
$P_n(r) = C_n^r\, p^r (1-p)^{n-r} \approx \frac{n!}{r!\,(n-r)!}\, p^r$.
From the results of the IS2M and IS4M, about 200 ions are
needed to cause one 1-bit upset, to an order of magnitude.
Assuming that this probability also applies to the IS2ME and
IS4ME, we have $p = 5 \times 10^{-3}$. Then, the probability of
two SBUs occurring at different times in one codeword is

\begin{equation}
P_{12}(2) = C_{12}^{2}\, p^{2} (1-p)^{10}
\approx \frac{12!}{2!\,10!}\,(5 \times 10^{-3})^{2}
\approx 1.65 \times 10^{-3}.
\tag{1}
\end{equation}

Similarly, the probability of three SBUs occurring at different
times in one codeword is

\begin{equation}
P_{12}(3) = C_{12}^{3}\, p^{3} (1-p)^{9}
\approx \frac{12!}{3!\,9!}\,(5 \times 10^{-3})^{3}
\approx 2.75 \times 10^{-5}.
\tag{2}
\end{equation}
The results of Eq. (1) and Eq. (2) show a probability difference
of nearly two orders of magnitude between r = 2 and r = 3.
The occurrence of three or more SBUs at different times in one
codeword is therefore of very low probability, and it is
neglected in this analysis.
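
The order-of-magnitude argument can be checked numerically. The sketch below evaluates the binomial expression directly for the values assumed in the text (n = 12, p = 5 × 10⁻³); the function name `p_upsets` is illustrative. Because p itself is only an order-of-magnitude estimate, only the orders of magnitude of the output are meaningful.

```python
from math import comb


def p_upsets(n, r, p):
    """Probability that exactly r of n independent bits are upset."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


p = 5e-3  # per-bit upset probability assumed in the text
n = 12    # bits per (12, 8) codeword

for r in (1, 2, 3):
    print(f"r = {r}: P = {p_upsets(n, r, p):.2e}")
```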

Therefore, the fundamental reason for the problem is that
a 2-bit upset in a codeword defeats the ECC module utilizing
the (12, 8) Hamming code.


2. Parsing the problem 


Figure 5 shows a basic memory architecture of an ECC module
utilizing Hamming code [10]. Table 2 gives the usual relationship
between the syndrome vector and the single-error location.


TABLE 2. The relationship between syndrome vector and single-error location

S3S2S1S0   Error location    S3S2S1S0   Error location
0001       P0                1000       P3
0010       P1                1001       D4
0011       D0                1010       D5
0100       P2                1011       D6
0101       D1                1100       D7
0110       D2                -          -
0111       D3                0000       No error

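Assuming that the 12-bit codeword is
D7D6D5D4P3D3D2D1P2D0P1P0, the 8-bit data word is vector D
and the 4-bit check word is vector P, the syndrome vector S can
be generated from the data word and check word as [11]:

\begin{equation}
\begin{aligned}
S_0 &= D_0 \oplus D_1 \oplus D_3 \oplus D_4 \oplus D_6 \oplus P_0,\\
S_1 &= D_0 \oplus D_2 \oplus D_3 \oplus D_5 \oplus D_6 \oplus P_1,\\
S_2 &= D_1 \oplus D_2 \oplus D_3 \oplus D_7 \oplus P_2,\\
S_3 &= D_4 \oplus D_5 \oplus D_6 \oplus D_7 \oplus P_3,
\end{aligned}
\tag{3}
\end{equation}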
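and the corresponding (12, 8) parity matrix, written here with
rows S0 to S3 (as in Eq. (6)) and columns in the codeword order
D7D6D5D4P3D3D2D1P2D0P1P0, follows from Eq. (3) as

\begin{equation}
\begin{pmatrix}
0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1\\
0 & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 & 0\\
1 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0\\
1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\tag{5}
\end{equation}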
Fig. 4. Functional block diagram of SRAMs with ECC module.

Fig. 5. Basic memory architecture for ECC module utilizing Hamming code.


When an 8-bit data word is written to the SRAM, the ECC
module generates a 4-bit check word to compose a 12-bit
codeword and stores it in the memory cells. After irradiation,
when the data word is read out from the memory cells
through the ECC module, the module generates a syndrome
vector S = (S3S2S1S0) from the codeword.
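
The write and read paths described above can be sketched in a few lines. This is a minimal model built from Eq. (3) and Table 2, not the vendor's implementation: the names `encode`, `syndrome` and `read` are illustrative, and each codeword bit is indexed by the syndrome value that Table 2 assigns to it (P0 = 1, P1 = 2, D0 = 3, ..., D7 = 12).

```python
# Codeword positions 1..12; a bit's position equals its syndrome value in Table 2.
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]  # D0..D7
CHECK_POS = [1, 2, 4, 8]                # P0..P3


def encode(data):
    """Build the 12-bit codeword {position: bit} for an 8-bit data word."""
    code = {pos: (data >> i) & 1 for i, pos in enumerate(DATA_POS)}
    for ppos in CHECK_POS:
        # Each check bit is the XOR of the data bits that share its syndrome bit.
        code[ppos] = sum(code[pos] for pos in DATA_POS if pos & ppos) % 2
    return code


def syndrome(code):
    """Syndrome S3S2S1S0 as an integer: XOR of the positions of all set bits."""
    s = 0
    for pos, bit in code.items():
        if bit:
            s ^= pos
    return s


def read(code):
    """Read through the ECC: correct the output copy, leave the stored word as is."""
    out = dict(code)
    s = syndrome(out)
    if 1 <= s <= 12:  # syndrome points to a codeword bit
        out[s] ^= 1
    return sum(out[pos] << i for i, pos in enumerate(DATA_POS))


stored = encode(0xA5)
stored[7] ^= 1            # single upset at D3
print(hex(read(stored)))  # the single-bit upset is corrected on read
```

Note that `read` works on a copy, which mirrors the assumption that the ECC module corrects only the output data and does not re-write the memory.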


In Eq. (5), each column vector of the parity matrix represents the
position of one bit ($D_u$ ($u = 0, 1, \ldots, 7$) or $P_v$ ($v = 0, 1, 2, 3$))
in the codeword: 0 means that the bit does not participate in
forming $S_k$ ($k = 0, 1, 2, 3$), and 1 means that it does.
How, then, does a 2-bit change in the codeword generate
S ≠ (0000), and how does this S point to an error in Table 2?
The method for finding the failure modes is as follows:


1. Neither of the 2 upset bits participates in $S_k$:
$S_k = 0 \oplus 0 = 0$, which points to “no error”.


2. Both of the 2 upset bits participate in $S_k$:
$S_k = 1 \oplus 1 = 0$, which points to “no error”.


3. Only one upset bit participates in $S_k$: $S_k = 1 \oplus 0 = 1$
or $S_k = 0 \oplus 1 = 1$. The value of the corresponding $S_k$
is always 1, so the ECC module detects an “error” and
performs a “correction”.


Consequently, the value of $S_k$ is determined by whether each
of the 2 upset bits participates in $S_k$: it is the “XOR” of the
two corresponding entries of the parity matrix.

For example, if the 2-bit upset consists of D3 and P0, it will
not affect the value of S0 (as both bits participate in it) or S3 (as
neither participates in it). However, S1 = P1 ⊕ D0 ⊕ D2 ⊕ D3 ⊕
D5 ⊕ D6 and S2 = P2 ⊕ D1 ⊕ D2 ⊕ D3 ⊕ D7 each contain only
D3, which results in S = (S3S2S1S0) = (0110). This can be
understood simply as:

\begin{equation}
\begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}
\oplus \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}
\tag{6}
\end{equation}

By Table 2, Eq. (6) points to an “error” at D2. The ECC
module then changes the correct value of D2 to a wrong value,
while the really upset bit D3 is read out as if it were “right”,
leading to a 2-bit error in D3D2. In other words, the data written
are “FF” and the data read out are “F3”, which is detected as an
error.
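
The D3/P0 example can be reproduced with a short sketch. It assumes, as in Table 2, that the syndrome of a single upset bit equals its position value, and that a syndrome matching no entry of Table 2 leaves the output unchanged; the helper name `read_after_upsets` is illustrative.

```python
# Syndrome value of each codeword bit, taken from Table 2.
POS = {"P0": 1, "P1": 2, "D0": 3, "P2": 4, "D1": 5, "D2": 6,
       "D3": 7, "P3": 8, "D4": 9, "D5": 10, "D6": 11, "D7": 12}
NAME = {v: k for k, v in POS.items()}


def read_after_upsets(data, upsets):
    """Return (syndrome, bit "corrected" by the ECC, data byte read out).

    For a valid stored codeword, the syndrome after the upsets is the XOR
    of the syndrome values of the upset bits.
    """
    s = 0
    for name in upsets:
        s ^= POS[name]
    target = NAME.get(s)   # None if S matches no entry of Table 2
    flipped = set(upsets)
    if target is not None:
        flipped ^= {target}  # the ECC flips the bit S points to
    for name in flipped:
        if name.startswith("D"):
            data ^= 1 << int(name[1])
    return s, target, data


s, target, out = read_after_upsets(0xFF, ["D3", "P0"])
print(f"S = {s:04b} -> {target}, read 0x{out:02X}")  # S = 0110 -> D2, read 0xF3
```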

Therefore, the analysis method can be reduced to the following
procedure: 1) extract two column vectors from the parity matrix
of Eq. (5) (two upsets of the same bit in a codeword do not
affect S, so this case is omitted); 2) apply an “XOR” operation
to them as in Eq. (6); 3) obtain the syndrome vector S; 4) find
the “error” position that S points to; 5) analyze the relationship
between the “error” and the “upsets”; and 6) compile statistics
of the failure modes, including the 1-bit, 2-bit and 3-bit errors
read out from the SRAMs.
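
The six steps above lend themselves to a brute-force enumeration. The following is a minimal sketch under two assumptions: each bit is identified by its syndrome value from Table 2, and a syndrome that matches no entry of Table 2 triggers no correction. The function name `errors_read_out` is illustrative.

```python
from collections import Counter
from itertools import combinations

DATA_POS = {3, 5, 6, 7, 9, 10, 11, 12}  # syndrome values of D0..D7 (Table 2)
ALL_POS = range(1, 13)                  # all 12 codeword bits


def errors_read_out(a, b):
    """Number of wrong data bits read out after upsets at positions a and b."""
    s = a ^ b                   # XOR of the two parity-matrix columns
    wrong = {a, b} & DATA_POS   # upset data bits remain wrong
    if s in DATA_POS:           # ECC flips a data bit that was correct
        wrong ^= {s}
    return len(wrong)


modes = Counter(errors_read_out(a, b) for a, b in combinations(ALL_POS, 2))
total = sum(modes.values())
for bits in sorted(modes):
    print(f"{bits}-bit: {modes[bits]}/{total} = {modes[bits] / total:.2%}")
```

The loop visits each of the 66 unordered bit pairs once, so the printed fractions are shares of all 2-bit upset combinations, each weighted equally.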


TABLE 3. Details of the failure modes of the ECC module for a 2-bit
upset with both upsets occurring in the check word

Upset      S3S2S1S0   “Error” position   Error      Error type
position              S points to        read out   (bits involved)
P1P0       0011       D0                 D0         1 bit
P2P0       0101       D1                 D1         1 bit
P3P0       1001       D4                 D4         1 bit
P2P1       0110       D2                 D2         1 bit
P3P1       1010       D5                 D5         1 bit
P3P2       1100       D7                 D7         1 bit


3. Analysis results 


Extracting two column vectors from the parity matrix of
Eq. (5), the total number of error types is $C_{12}^{2} = 66$. Tables
3-5 list details of the failure modes and error types.


1. When both upset bits are in the check word (Table 3)


In this case, the ECC module operates incorrectly. The
number of error types is $C_4^2 = 6$, and all of the failure
modes are 1-bit errors.


2. When 1 upset bit is in the check word and 1 upset bit is in
the data word (Table 4)


In this case, the ECC module operates incorrectly. The
number of error types is $C_4^1 C_8^1 = 32$, of which 20 are
1-bit errors and 12 are 2-bit errors, so the failure modes
include 1-bit and 2-bit errors.


3. When both upset bits are in the data word (Table 5)


In this case, the ECC module operates incorrectly. The
number of error types is $C_8^2 = 28$, of which 13 are 2-bit
errors and 15 are 3-bit errors, so the failure modes include
2-bit and 3-bit errors.
TABLE 4. Details of the failure modes of the ECC module with 1 upset
bit in the check word and 1 upset bit in the data word

Upset      S3S2S1S0   “Error” position   Error      Error type
position              S points to        read out   (bits involved)
D0P0       0010       P1                 D0         1 bit
D1P0       0100       P2                 D1         1 bit
D2P0       0111       D3                 D3D2       2 bit
D3P0       0110       D2                 D3D2       2 bit
D4P0       1000       P3                 D4         1 bit
D5P0       1011       D6                 D6D5       2 bit
D6P0       1010       D5                 D6D5       2 bit
D7P0       1101       no point           D7         1 bit
D0P1       0001       P0                 D0         1 bit
D1P1       0111       D3                 D3D1       2 bit
D2P1       0100       P2                 D2         1 bit
D3P1       0101       D1                 D3D1       2 bit
D4P1       1011       D6                 D6D4       2 bit
D5P1       1000       P3                 D5         1 bit
D6P1       1001       D4                 D6D4       2 bit
D7P1       1110       no point           D7         1 bit
D0P2       0111       D3                 D3D0       2 bit
D1P2       0001       P0                 D1         1 bit
D2P2       0010       P1                 D2         1 bit
D3P2       0011       D0                 D3D0       2 bit
D4P2       1101       no point           D4         1 bit
D5P2       1110       no point           D5         1 bit
D6P2       1111       no point           D6         1 bit
D7P2       1000       P3                 D7         1 bit
D0P3       1011       D6                 D6D0       2 bit
D1P3       1101       no point           D1         1 bit
D2P3       1110       no point           D2         1 bit
D3P3       1111       no point           D3         1 bit
D4P3       0001       P0                 D4         1 bit
D5P3       0010       P1                 D5         1 bit
D6P3       0011       D0                 D6D0       2 bit
D7P3       0100       P2                 D7         1 bit


Therefore, the total number of 1-bit error types is 6 + 20 = 26,
with a probability among all error types of 26/66 = 39.39%; the
total number of 2-bit error types is 12 + 13 = 25, with a
probability of 25/66 = 37.88%; and the total number of 3-bit
error types is 15, with a probability of 15/66 = 22.73%. Table 6
shows that the theoretical probabilities of the failure modes
involving 1-, 2- and 3-bit errors agree well with the
experimental results.

Therefore, the underlying cause of the failure modes of the ECC
module in this experiment is the failure of the (12, 8) Hamming
code when 2 bits are upset in one codeword.


IV. CONCLUSION 


The results show both the effectiveness and the limits of the
ECC module utilizing (12, 8) Hamming code in 65 nm process
node SRAMs. The ECC module is clearly effective in hardening
the advanced process node SRAMs. The failure modes involving
1-, 2- and 3-bit errors in a data word have been analyzed, and
their essential cause is the failure of the (12, 8) Hamming code
when 2 bits are upset in one codeword. The measured
distribution of bits per upset event agrees well with the
theoretical calculation.
TABLE 5. Details of the failure modes of the ECC module when
both upset bits occur in the data word

Upset      S3S2S1S0   “Error” position   Error      Error type
position              S points to        read out   (bits involved)
D1D0       0110       D2                 D2D1D0     3 bit
D2D0       0101       D1                 D2D1D0     3 bit
D3D0       0100       P2                 D3D0       2 bit
D4D0       1010       D5                 D5D4D0     3 bit
D5D0       1001       D4                 D5D4D0     3 bit
D6D0       1000       P3                 D6D0       2 bit
D7D0       1111       no point           D7D0       2 bit
D2D1       0011       D0                 D2D1D0     3 bit
D3D1       0010       P1                 D3D1       2 bit
D4D1       1100       D7                 D7D4D1     3 bit
D5D1       1111       no point           D5D1       2 bit
D6D1       1110       no point           D6D1       2 bit
D7D1       1001       D4                 D7D4D1     3 bit
D3D2       0001       P0                 D3D2       2 bit
D4D2       1111       no point           D4D2       2 bit
D5D2       1100       D7                 D7D5D2     3 bit
D6D2       1101       no point           D6D2       2 bit
D7D2       1010       D5                 D7D5D2     3 bit
D4D3       1110       no point           D4D3       2 bit
D5D3       1101       no point           D5D3       2 bit
D6D3       1100       D7                 D7D6D3     3 bit
D7D3       1011       D6                 D7D6D3     3 bit
D5D4       0011       D0                 D5D4D0     3 bit
D6D4       0010       P1                 D6D4       2 bit
D7D4       0101       D1                 D7D4D1     3 bit
D6D5       0001       P0                 D6D5       2 bit
D7D5       0110       D2                 D7D5D2     3 bit
D7D6       0111       D3                 D7D6D3     3 bit
TABLE 6. The measured and calculated probabilities of the failure
modes involving 1-, 2- and 3-bit errors

Error type     Number of errors   Probability of error
               (measured)         Measured     Theoretical
1-bit error    119/294            40.48%       39.39%
2-bit error    111/294            37.76%       37.88%
3-bit error    64/294             21.77%       22.73%
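
One way to quantify the agreement in Table 6 is a chi-square goodness-of-fit test. This check is not part of the original analysis; it is a hedged sketch that treats the 294 measured events as independent draws from the theoretical 26 : 25 : 15 distribution. For two degrees of freedom the chi-square survival function is exactly exp(-x/2), so no statistics library is needed.

```python
from math import exp

observed = {1: 119, 2: 111, 3: 64}  # measured events per error type (Table 6)
weights = {1: 26, 2: 25, 3: 15}     # theoretical failure modes out of 66

n = sum(observed.values())
expected = {k: n * w / 66 for k, w in weights.items()}
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
p_value = exp(-chi2 / 2)  # exact survival function for 2 degrees of freedom

print(f"chi2 = {chi2:.3f}, p = {p_value:.2f}")
```

Under these assumptions the statistic lies far below the 5% critical value of 5.99 for two degrees of freedom, so the measured distribution is statistically compatible with the theoretical one.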


Several mitigation approaches are available if much higher
reliability is required. Periodic memory scrubbing is often
used to improve the performance of the device, and scrubbing
will be applied to the SRAMs exposed to heavy ions in our
laboratory, so as to observe the relationship between the scrub
rate and the bit error rate (BER). If more redundancy is
acceptable, the triple-bit-correcting Golay code or Triple
Modular Redundancy (TMR) may be employed.

This research on 65 nm SRAMs may serve as a reference for
manufacturers in choosing the hardening scheme and algorithm,
and for users in selecting the device application environment
and methods.